Large Scale Collocation Data and Their Application to Japanese Word Processor Technology

نویسندگان

  • Yasuo Koyama
  • Masako Yasutake
  • Kenji Yoshimura
  • Kosho Shudo
چکیده

Word processors or computers used in Japan employ Japanese input method through keyboard stroke combined with Kana (phonetic) character to Kanji (ideographic, Chinese) character conversion technology. The key factor of Kana-to-Kanji conversion technology is how to raise the accuracy of the conversion through the homophone processing, since we have so many homophonic Kanjis. In this paper, we report the results of our Kana-to-Kanji conversion experiments which embody the homophone processing based on large scale collocation data. It is shown that approximately 135,000 collocations yield 9 .1% raise of the conversion accuracy compared with the prototype system which has no collocation data,

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Collocations as Word Co-ocurrence Restriction Data - An Application to Japanese Word Processor

Collocations, the combination of specific words are quite useful linguistic resources for NLP in general. The purpose of this paper is to show their usefulness, exemplifying an application to Kanji character decision processes for Japanese word processors. Unlike recent trials of automatic extraction, our collocations were collected manually through many years of intensive investigation of corp...

متن کامل

Comparing Lexical Relationships Observed within Japanese Collocation Data and Japanese Word Association Norms

While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper conducts some initial comparisons of the lexical relationships observed within Japanese collocation data extracted from a large corpus using the Japanese language version of the Sketc...

متن کامل

Coling 2008 22 nd International Conference on Computational Linguistics

While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper conducts some initial comparisons of the lexical relationships observed within Japanese collocation data extracted from a large corpus using the Japanese language version of the Sketc...

متن کامل

Solving infinite horizon optimal control problems of nonlinear interconnected large-scale dynamic systems via a Haar wavelet collocation scheme

We consider an approximation scheme using Haar wavelets for solving a class of infinite horizon optimal control problems (OCP's) of nonlinear interconnected large-scale dynamic systems. A computational method based on Haar wavelets in the time-domain is proposed for solving the optimal control problem. Haar wavelets integral operational matrix and direct collocation method are utilized to find ...

متن کامل

Using Collocations and K-means Clustering to Improve the N-pos Model for Japanese IME

Kana-Kanji conversion is known as one of the representative applications of Natural Language Processing (NLP) for the Japanese language. The N-pos model, presenting the probability of a Kanji candidate sequence by the product of bi-gram Part-of-Speech (POS) probabilities and POS-to-word emission probabilities, has been successfully applied in a number of well-known Japanese Input Method Editor ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998